The growth of biotechnology has accelerated, particularly during the last decade, owing to the accumulation of vast amounts of sequence and structure information from genome sequencing projects and the solving of crystal structures. This, coupled with advances in information technology, has made biotechnology increasingly dependent on computationally intensive approaches, and has led to the emergence of a super-speciality discipline called bioinformatics.
Bioinformatics has become a frontline applied science and is of vital importance to the study of the new biology, which is widely recognised as the defining scientific endeavour of the twenty-first century. The genomic revolution has underscored the central role of bioinformatics in understanding the very basics of life processes.
Advancing Beyond the Genome Project
With the sequencing of the human genome and the genomes of a number of other organisms completed, the main challenge before the bioinformatician is to analyse and interpret this information. In this regard, the following issues are of particular importance:
- High-throughput genome assembly
- Annotation
- Comparative modeling and assignment of protein folds
- Prediction of structure, dynamics and thermodynamics
- Advanced biotechnological applications, e.g. drug design and gene therapy
- Systems modeling
Applications of Bioinformatics
Present-day applications of bioinformatics are diverse, ranging from studies of the evolution of life on Earth to the generation of designer drugs. Sequence analysis focuses on finding new genes, analysing the structure of a gene to determine its function, and correlating altered gene structure with disease. Molecular modeling studies attempt to understand how the three-dimensional topography of a protein is related to its function. Other complex applications include modeling cell signaling and metabolic pathways, studying protein-protein interactions, understanding the mechanisms by which protein families evolve, and mapping the expression patterns of a plethora of genes across different cells and tissues.
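A common computational starting point for finding new genes is a scan for open reading frames (ORFs): stretches that begin with an ATG start codon and run in frame to a stop codon. The Python sketch below is a deliberately minimal illustration, assuming the standard genetic code and the forward strand only; the function name `find_orfs` and the `min_codons` cut-off are hypothetical choices for this example. Real gene finders also have to deal with splicing, codon usage and the reverse strand.

```python
# Stop codons of the standard genetic code (assumption: DNA alphabet, standard code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. `min_codons` (a hypothetical
    cut-off) counts codons including both start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only, so seq[i:i + 3] is always 3 bases.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGGCTTGAAAATGAAATAG", min_codons=2))
# [(13, 22, 1), (2, 11, 2)]
```

The tiny example finds one ORF in reading frame 1 and one in frame 2; on real genomic DNA a much larger `min_codons` is needed, because short ORFs arise frequently by chance.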
The Genome Databases
Internationally, bioinformatics is progressing at an astonishing rate. The major thrust areas appear to be the acquisition of sequence data, its organisation into classified databases, the integration of sequence information with structure data, the development of tools for effective data mining, and the development of a common platform for resource sharing and integration. In this last area, the outcome has been the International Nucleotide Sequence Database Collaboration (INSDC), involving the three major nucleotide sequence databases of the world: GenBank, EMBL and DDBJ.
The INSDC framework comprises:
1. The taxonomy project - a uniform taxonomy used by all three databases
2. The feature table - shared rules that allow data to be exchanged among the three organizations
3. The db_xref qualifier - an explicit cross-reference from a sequence entry to a specific record in another database
4. The country qualifier - the country of origin of the sequence
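The shared feature table is what makes this exchange machine-readable: each feature carries qualifiers written as `/name="value"`, for example `/db_xref="taxon:9606"`. The Python sketch below assumes single-line qualifiers copied from a flat-file entry; `parse_qualifiers` and `split_db_xref` are hypothetical helper names, and this is not a full feature-table parser (multi-line values, escaped quotes and feature locations are ignored).

```python
import re
from collections import defaultdict

# One single-line qualifier: /name="quoted value", /name=value, or a bare /name.
QUALIFIER = re.compile(r'^\s*/(\w+)(?:=(?:"([^"]*)"|(\S+)))?\s*$')


def parse_qualifiers(lines):
    """Collect qualifier values by name; a name may repeat (e.g. several db_xref)."""
    quals = defaultdict(list)
    for line in lines:
        m = QUALIFIER.match(line)
        if m:
            name, quoted, bare = m.groups()
            quals[name].append(quoted if quoted is not None else bare)
    return dict(quals)


def split_db_xref(value):
    """Split a db_xref value such as 'taxon:9606' into (database, identifier)."""
    database, _, identifier = value.partition(":")
    return database, identifier


feature_lines = [
    '                     /organism="Homo sapiens"',
    '                     /db_xref="taxon:9606"',
    '                     /country="India"',
]
parsed = parse_qualifiers(feature_lines)
print([split_db_xref(x) for x in parsed["db_xref"]])  # [('taxon', '9606')]
print(parsed["country"])  # ['India']
```

Keeping values in lists reflects the fact that a qualifier such as db_xref may legitimately appear more than once on a single feature.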
The growth of public-domain databases has been particularly instrumental in achieving these many objectives on a global scale. Today we can retrieve an enormous amount of information on virtually any aspect of cellular biology from the Internet, whether bibliographic, genomic, structural or even functional. This information serves as seed data for valuable downstream research. Apart from providing the basic information, the public domain fortunately also provides the utilities required to analyse and interpret such data, ranging from multiple sequence alignment to virtual gene expression studies and electronic PCR.
The Many Faces of Genome Analysis
The availability of genome information presents the bioinformatician with a new set of challenges, chief among them the analysis of a seemingly fragmentary body of knowledge. Currently, the predominant areas of bioinformatics data analysis include:
- Sequence Alignment Studies
- Prediction of Protein Structure
The two areas have the commonalities and differences that one might expect.
Sequence Alignment Studies: - Sequence alignment studies are of two major types: pairwise alignment, as performed in BLAST searches, and multiple alignment, as performed by programmes such as Clustal. In all cases the idea is to find the similarities and differences among a set of sequences and to infer how they arose and how they are changing. Sequence analysis is an extraordinary tool for studying evolutionary relationships among genomes, gene duplication, splicing and so on.
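The core idea of pairwise alignment can be illustrated with dynamic programming. The Python sketch below implements the textbook Needleman-Wunsch global alignment under an assumed scoring scheme (match +1, mismatch -1, gap -1). This is a swapped-in classical technique, not what BLAST itself does: BLAST uses fast heuristic local alignment, and Clustal builds multiple alignments progressively from pairwise comparisons.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # match/mismatch
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The full table costs time and memory proportional to the product of the two sequence lengths, which is one reason database-scale search tools such as BLAST rely on heuristics instead.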
Protein Structure Prediction: - Predicting the structure of a protein from its sequence is one of the most exciting areas of bioinformatics. The crux of the process lies in assigning folds and domains to the primary structure and thereby developing an acceptable model of the tertiary and quaternary structure. Several methods are now available, e.g. comparative modeling, threading, ab initio methods and genetic algorithms.
Mining the Genome - On the Trail of the Treasure
With a well-equipped bioinformatics laboratory in place, the rewards can be stupendous. Popularly called genome data mining, the approach combines traditional methods such as sequence similarity studies and multiple alignment with more complex developments such as Serial Analysis of Gene Expression [SAGE], electronic PCR and microarray informatics. Some of the areas of application include:
- Gene identification
- Drug discovery
- Phylogenomics
- Detection of genomic markers and polymorphism
- Understanding gene expression profiles
- Exploring new metabolic and regulatory pathways
- Assigning functions to unknown ESTs
- Understanding protein-protein interactions
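Electronic PCR asks whether a primer pair would amplify a product of roughly the expected size from a given sequence. The Python sketch below captures that idea under simplifying assumptions: exact primer matches only, the plus strand of the template only, and a hypothetical size tolerance `margin`. All function names are illustrative. Dedicated e-PCR tools are more permissive, for example allowing a few mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_all(seq: str, sub: str) -> list[int]:
    """All start positions of sub in seq, overlapping matches included."""
    positions, i = [], seq.find(sub)
    while i != -1:
        positions.append(i)
        i = seq.find(sub, i + 1)
    return positions


def electronic_pcr(template: str, forward: str, reverse: str,
                   expected_size: int, margin: int = 50) -> list[tuple[int, int, int]]:
    """Return (start, end, size) for each predicted product on the plus strand.

    The forward primer must match the template as written; the reverse primer
    anneals to the opposite strand, so its site on the plus strand is its
    reverse complement. Exact matches only (a simplifying assumption).
    """
    template = template.upper()
    forward = forward.upper()
    site = reverse_complement(reverse.upper())
    products = []
    for f in find_all(template, forward):
        for r in find_all(template, site):
            end = r + len(site)
            size = end - f
            # The reverse-primer site must lie downstream of the forward primer.
            if r >= f + len(forward) and abs(size - expected_size) <= margin:
                products.append((f, end, size))
    return products


template = "GGGATGCCTTTTTTTTTTGGATCAAA"
print(electronic_pcr(template, "ATGCC", "GATCC", expected_size=20, margin=0))
# [(3, 23, 20)]
```

Requiring the product size to fall near the expected value is what makes the check specific: primer sites that happen to occur far apart are not reported as amplicons.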
High-Performance Computing - Sharpening a Blunted Knife
With the exponential growth of genome information and the need for rapid analysis, the once extraordinary power of serial computation is beginning to falter. Given the present sizes of known genomes, an operation such as whole-genome comparison is likely to take an unacceptably long time on a single processor. This constraint has driven the adoption of parallel computation, in which a large problem is divided into independent parts that are processed simultaneously on many processors, cutting down the overall computation time.
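Splitting a genome-scale job into independent pieces can be sketched in Python. The example below counts k-mers (short words of length k), a simple ingredient of whole-genome comparison chosen here purely for illustration. It cuts the sequence into chunks that overlap by k - 1 bases so no word is lost at a boundary, counts each chunk in a separate worker, and merges the partial counts. The chunk count, the function names and the default use of `ProcessPoolExecutor` are assumptions; any `concurrent.futures` executor can be passed in.

```python
from collections import Counter
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def count_kmers(chunk: str, k: int) -> Counter:
    """Count every word of length k that starts inside this chunk."""
    return Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))


def split_with_overlap(seq: str, n_chunks: int, k: int) -> list[str]:
    """Cut seq into about n_chunks pieces, each extended by k - 1 bases so
    that every k-mer start position falls in exactly one piece."""
    size = max(1, -(-len(seq) // n_chunks))  # ceiling division
    return [seq[start:start + size + k - 1] for start in range(0, len(seq), size)]


def parallel_kmer_counts(seq: str, k: int, n_chunks: int = 4,
                         executor: Optional[Executor] = None) -> Counter:
    """Count k-mers chunk by chunk on an executor, then merge the partial counts."""
    chunks = split_with_overlap(seq, n_chunks, k)
    pool = executor if executor is not None else ProcessPoolExecutor()
    try:
        total = Counter()
        for partial in pool.map(count_kmers, chunks, [k] * len(chunks)):
            total.update(partial)
        return total
    finally:
        if executor is None:  # only shut down a pool we created ourselves
            pool.shutdown()


def shared_kmers(genome_a: str, genome_b: str, k: int,
                 executor: Optional[Executor] = None) -> set[str]:
    """k-mers present in both genomes: a crude similarity signal."""
    a = parallel_kmer_counts(genome_a, k, executor=executor)
    b = parallel_kmer_counts(genome_b, k, executor=executor)
    return set(a) & set(b)


if __name__ == "__main__":
    # The __main__ guard lets process-based workers start safely on all platforms.
    print(sorted(shared_kmers("ACGTACGTTG", "TTACGTAA", k=4)))
    # ['ACGT', 'CGTA', 'TACG']
```

Because each chunk is counted independently and the merge is a simple sum, adding workers shortens the counting phase without changing the result.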
The Indian Side of Bioinformatics
India has embarked upon a major national initiative in bioinformatics. Leading the way is the Department of Biotechnology's Biotechnology Information System of India. Spread across the country as a distributed resource of about 61 centres, the network is expected to bolster India's efforts to harness the deluge of biological information. Other programmes include the High Performance Computing Initiatives at the Centre for Development of Advanced Computing, Pune, where parallel computing is used to address problems in evolutionary biology, large-scale genome comparison and biological system modeling. A supercomputing facility has also been established at IIT Delhi to promote in-silico drug development.
The Biotechnology Information System Network (BTISnet)
Structure: - The network comprises 10 Distributed Information Centres [DICs] and 50 Distributed Information Sub-Centres [DISCs]. The entire system is headed and coordinated by an Apex Biotechnology Information Centre at the DBT Headquarters in New Delhi.
Activities: - The major activities of the Bioinformatics Centres are:
- To provide a national bioinformation network covering the diverse areas of multidisciplinary biotechnology
- To develop information resources, databases, and information-handling tools and techniques
- To establish information linkages with international organizations
- To evolve educational programmes and implement human resource development in bioinformatics
- To undertake research and development activities in the field of bioinformatics
State-of-the-Art: - Aided by a highly sophisticated communication backbone and six Interactive Graphics Facilities for molecular modeling, the network has been instrumental in developing important databases and software. Four long-term Advanced Diploma Courses in Bioinformatics at the post-M.Sc. level are currently operational at Madurai, Pune, Calcutta and New Delhi to increase the supply of trained personnel in the area. The network also maintains an array of mirror sites of some of the major public-domain databases for the benefit of practising scientists. M.Sc., M.Tech. and Ph.D. programmes in Bioinformatics have also been introduced recently. As we gear up for the post-genomic era, BTISnet is poised to play an increasingly crucial role.
--- The author is Director (Bioinformatics), Department of Biotechnology, Govt. of India, New Delhi.